An Improvement Method for The Ambiguous Fragments Discovery in Chinese Word Segmentation
Abstract
Disambiguation is a difficult task in automatic Chinese word segmentation, and discovering ambiguous fragments is the foundation of disambiguation. This article proposes a method, named Bidirectional Maximum Matching and Retroversion Multiword, for discovering ambiguous fragments; it can handle long overlapping-ambiguity fragments. Experiments show that this method achieves better results than existing methods.
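The abstract does not spell out the algorithm, so the following is only a minimal sketch of plain bidirectional maximum matching, the first half of the method's name: segment the text with forward and backward maximum matching, then report every span on which the two segmentations disagree as a candidate ambiguous fragment. The Retroversion Multiword step is not described in the abstract and is not implemented here. The function names, the `max_len` parameter, and the toy lexicon are assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len):
    """Greedy left-to-right segmentation, longest lexicon word first."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len):
    """Greedy right-to-left segmentation, longest lexicon word first."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def boundaries(words):
    """Return the set of end offsets of each word in a segmentation."""
    cuts, pos = set(), 0
    for word in words:
        pos += len(word)
        cuts.add(pos)
    return cuts


def ambiguous_fragments(text, lexicon, max_len=4):
    """Return (offset, fragment) pairs where the two segmentations differ."""
    fcuts = boundaries(forward_max_match(text, lexicon, max_len))
    bcuts = boundaries(backward_max_match(text, lexicon, max_len))
    fragments, start = [], 0
    # Walk the boundaries both segmentations agree on; any span between two
    # agreed boundaries that contains a non-shared cut is a disagreement.
    for cut in sorted(fcuts & bcuts):
        if any(start < c < cut for c in fcuts ^ bcuts):
            fragments.append((start, text[start:cut]))
        start = cut
    return fragments


# Hypothetical toy lexicon, not taken from the article.
LEXICON = {"我们", "结合", "合成", "成分", "分子"}
print(ambiguous_fragments("我们结合成分子", LEXICON))
```

In this toy input the forward pass yields 我们/结合/成分/子 and the backward pass yields 我们/结/合成/分子, so the whole chain 结合成分子 is reported as one overlapping-ambiguity fragment while 我们 is not.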
Similar resources
A Hybrid Approach to Word Segmentation and POS Tagging
In this paper, we present a hybrid method for word segmentation and POS tagging. The target languages are those in which word boundaries are ambiguous, such as Chinese and Japanese. In the method, word-based and character-based processing is combined, and word segmentation and POS tagging are conducted simultaneously. Experimental results on multiple corpora show that the integrated method has ...
Integrating Ngram Model and Case-based Learning for Chinese Word Segmentation
This paper presents our recent work for participation in the First International Chinese Word Segmentation Bakeoff (ICWSB-1). It is based on a general-purpose ngram model for word segmentation and a case-based learning approach to disambiguation. This system excels in identifying in-vocabulary (IV) words, achieving a recall of around 96-98%. Here we present our strategies for language model trai...
A Chinese word segmentation based on language situation in processing ambiguous words
While natural language processing is beneficial to text mining, Chinese word segmentation is an important step in processing Chinese natural language. In this paper, the convergence essence of the segmentation process is analyzed, and a theory of Chinese word segmentation based on language situation is deduced. Based on the segmentation theory, an algorithm of Chinese word se...
Chinese word segmentation and its effect on information retrieval
A set of IR experiments was carried out to study the impact of Chinese word segmentation and its effect on information retrieval (IR) at the Division of Information Studies, Nanyang Technological University, Singapore. A total of four automatic character-based segmentation approaches and a manual word segmentation approach were first carried out to obtain the word segments for indexing and to ev...
Exploiting unlabeled internal data in conditional random fields to reduce word segmentation errors for Chinese texts
The application of text-to-speech (TTS) conversion has become widely used in recent years. Chinese TTS faces several unique difficulties. The most critical is caused by the lack of word delimiters in written Chinese. This means that Chinese word segmentation (CWS) must be the first step in Chinese TTS. Unfortunately, due to the ambiguous nature of word boundaries in Chinese, even the best CWS s...